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Cancer progression is an evolutionary process that is driven by mutation and selection in a population of tumor 
cells. We discuss mathematical models of cancer progression, starting from traditional multistage theory. Each 
stage is associated with the occurrence of genetic alterations and their fixation in the population. We describe the 
accumulation of mutations using conjunctive Bayesian networks, an exponential family of waiting time models 
in which the occurrence of mutations is constrained to a partial temporal order. Two opposing limit cases arise 
if mutations either follow a linear order or occur independently. We derive exact analytical expressions for the 
waiting time until a specific number of mutations have accumulated in these limit cases as well as for the general 
conjunctive Bayesian network. Finally, we analyze a stochastic population genetics model that explicitly accounts 
for mutation and selection. In this model, waves of clonal expansions sweep through the population at equidistant 
intervals. We present an approximate analytical expression for the waiting time in this model and compare it to 
the results obtained for the conjunctive Bayesian networks. 
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I. INTRODUCTION 

Cancer is a genetic disease that develops as the resuh of 
mutations in specific genes. When these genes work normally, 
they control the growth of cells in the body. Cancer cells have 
lost the normal cooperative behavior of cells in multicellular 
organisms resulting in increased proliferation. Tumor develop- 
ment starts from a single genetically altered cell and proceeds 
by successive clonal expansions of cells that have acquired ad- 
ditional advantageous mutations. The progression of cancer 
is characterized by the accumulation of these genetic changes 
(Cairns, 1975; Crespi and Summers, 2005; Merlo et ai, 2006; 
Michor et ai, 2004; Nowell, 1976). 

Many oncogenes and tumor suppressor genes have been 
identified that contribute to tumorigenesis (Futreal et ai, 
2004). In general, the mutational patterns of cancer cells vary 
greatly, not only among cancer types, but also among individ- 
ual tumors of the same type. Some of this genetic variation 
might be due to the fact that all cancer cells need to acquire 
certain functional changes, the hallmarks of cancer, and most 
of these functions are accomplished by several gene products 
acting together in signaling pathways (Hanahan and Weinberg, 
2000; Vogelstein and Kinzler, 2004). Thus, many different ge- 
netic alterations can have similar phenotypic effects. 

The incidence of sporadic cancer indicates that the underly- 
ing events are stochastic and that, in general, several steps are 
necessary. Therefore, a random processes approach appears to 
be an appropriate modeling strategy. The progression stages 
are generally not observable on a molecular level in vivo, and 
in a clinical setting, patients are typically diagnosed at the fi- 
nal stages of tumorigenesis. Mathematical modeling plays an 
important role in cancer research today, because it can be used 
to reconstruct and to analyze the evolutionary process driving 
cancer progression (Anderson and Quaranta, 2008). 

Models of tumorigenesis have been proposed early on to ex- 
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plain cancer incidence data (Armitage and Doll, 1954; Knud- 
son, 1971; Nordling, 1953). These models assume that cancer 
is a stochastic multistep process with small transition rates and 
they have been further developed into the multistage theory 
of cancer (Frank, 2007; Jones et ai, 2008a; Moolgavkar and 
Luebeck, 1992). The tumor stages may be defined by specific 
mutations, by the number of mutations, by epigenetic changes, 
by functional alterations, or by histological properties. Since 
cancer progression is an evolutionary process, population ge- 
netics models are used extensively to describe tumorigene- 
sis (Beerenwinkel et al., 2007a; Durrett et ai, 2008; Nowak, 
2006; Schinazi, 2006; Wodarz and Komarova, 2005). Various 
deterministic and stochastic models have been proposed, some 
of which address specific questions, such as the dynamics of 
tumor suppressor genes (Iwasa et ai, 2005), genetic instabil- 
ity (Nowak et ai, 2006), or tissue architecture (Nowak et ai, 
2003). 

As more and more genetic data from cancer cells become 
available from comprehensive studies (Jones et ai, 2008b; Ley 
et ai, 2008; Parsons et ai, 2008; Sjoblom et ai, 2006; Wood 
et ai, 2007) and through databases (Baudis, 2007; Futreal 
et ai, 2004; Mitelman et ai, 2008), one can also start inves- 
tigating the dependencies between genetic events using sta- 
tistical models. In view of multistage theory, tumors pro- 
ceed through distinct stages, which can be characterized by 
the appearance of certain mutations. Particular attention has 
been paid to inferring the order of genetic alterations. Several 
graphical models have been developed for this purpose and 
applied to various cancer types (Beerenwinkel and Sullivant, 
2008; Beerenwinkel et ai, 2006, 2007b; Desper et ai, 1999; 
Hjelm et ai, 2006; Radmacher et ai, 2001; Rahnenfiihrer 
era/., 2005). 

A quantitative understanding of carcinogenesis can help de- 
veloping new diagnostic and prognostic markers. Today a va- 
riety of univariate genetic markers is known (Sidransky, 2002), 
most of them comprising well-known oncogenes or tumor sup- 
pressors. Because of the diverse genetic nature of cancer, 
markers measuring the accumulation of several mutations, i.e., 
the progression of cancer, may improve existing ones. Here, 
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normal tissue > metastases 

FIG. 1 The multistage model. Cancer is assumed to develop in a 
linear process consisting of k stages. The progression steps mark the 
major transitions of the tissue from normal to cancerous and even- 
tually to metastasizing. At each transitions {j — !)—>_/ the waiting 
time follows an exponential distribution with parameter iij giving rise 
to the model defined by Eq. (1). 



we investigate the dynamics of cancer progression as a func- 
tion of transition rates and of order constraints on the genetic 
events. The expected waiting time can be regarded as a mea- 
sure of genetic progression to cancer (Rahnenfiihrer et ai, 
2005). 

In Section II, we introduce the general stochastic multistep 
process and present an equivalent description in terms of or- 
dinary differential equations (ODEs). At this abstract level 
of description, carcinogenesis may be regarded as proceed- 
ing through distinct stages, which can be defined by histolog- 
ical grades, functional changes, or genetic alterations. In Sec- 
tion III, these stages will be associated with the occurrence of 
a certain number of mutations. We present expressions of the 
waiting time until a given stage is reached, for different models 
of mutation. Finally, in Section IV, we analyze an evolutionary 
model of carcinogenesis explicitly describing the appearance 
of genetic alterations in the tissue by mutations in single cells 
and their subsequent clonal expansions. 



II. MULTISTAGE THEORY 

The multistage theory of cancer postulates that tumorigene- 
sis is a linear multistep process, in which each step from one 
stage to the next is a rare event (Figure 1). Let us denote the 
cancer stages by 0, 1, 2, . . . , A;, where stage refers to the nor- 
mal precancerous state, 1 to the first adenomatous stage, and k 
to a defined cancerous endpoint, such as the the formation of 
metastases. The process is started at time f = in state 0. 

The transition rate from stage j — lto stage j is denoted Uj. 
That is, the waiting times for the transitions to occur are as- 
sumed to be independently exponentially distributed. Here, the 
coefficients Uj denote the transition rates between the stages of 
tumor development; later we will link them to different models 
of mutation and fixation. Because of the sequential nature of 
the linear model, the waiting time Zk until stage k is reached is 
given recursively by the sum of exponentially distributed ran- 
dom variables, 

Tl~Exp(Ml), T; ~ Ty_l +Exp(My), j^2,...,k. 

(1) 
The waiting times follow the linear order ti < ■ ■ ■ < r^. and 
the expected waiting time is 
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In particular, if all transition rates are equal, Uj — u for all 
i — 1, . . . , fc, we find V\zk\ — k/u. Hence the waiting time 
scales linear in the number of transitions k. 

Let /r,,...,rj(fi, ■ ■ • , f*:) be the density function of the joint 
distribution of waiting times r — {t\, . . . ,Tk) defined by 
Eq. (1). The linear order of the waiting times xj induces the 
factorization 

k 

fn,...,nitu ...,tk)^Y[ f^jUj-i^'J I O-i) (3) 

of /ri,...,rt into the conditional densities 

fTj\r,-iitj I tj-i) = Ujexp{-Uj[tj -tj-i]) litj > tj_i), 

(4) 

where I is the indicator function. 

Multistage theory can also be formulated as a system of or- 
dinary differential equations (ODEs). We derive this formula- 
tion as follows: Let x jit) denote the probability that stage j is 
reached before time f > 0, but stage /' + 1 has not yet been 
reached, 

xo{t) — Prob[0 < t < Ti] 

Xjit) = Prob[T; < t < Tj+i], j ^l,...k-l, (5) 

Xkit) — Prob[Ti, < t]. 

We have xo{t) + ■ ■ ■ + Xkit) — 1 and JC/(f) — Prob[r < 
Tj_i_i] — Prob[f < Tj] due to the lineaiity of transitions. It fol- 
lows that Xj{t) — /r +1 (0 — ftj (0-Using the conditional ex- 
ponential nature of the model, Eq. (4), one finds that /r (f ) — 

/o~ /r,.r,_, {t, t') dt' = /o' exp(-M,[f - f'])/,^._, it') dt'. From 
the identity exp(—Ujt) — Uj J^ exp(—Ujt') dt', one obtains 

/OO Pt 
j exp{-ui[t"-t'])f,^_,it')dt'dt" 

/OO Pt 
J /.,,.,_, (f",f')df'df" (6) 

= M^-Prob[Ty_i < t < Zj] — UjXj-i(t). 

Hence, the probabilities Xj (f ) obey the set of ODEs, 

X0{t) — -UlXQ{t), 

Xj(t) = ujXj-i(t)-Uj+iXj(t), j ^l,...,k-l, (7) 

Xkit) = UkXk-l{t), 

subject to initial conditions jco(O) = 1 and Xj(Q) — for all 
j > I. These rate equations describe the linear chain of ex- 
ponential waiting time processes as a probability flux of rate 
UjXj-i{t) from state /' — 1 to state j. 

If all rates are identical, u j — u for all j , then the solution 
of this linear system of ODEs is given by Poisson distributions 
with time-dependent parameter ut, 

{uty exp(— Mf) 
Xj{t) = Pois(;;Mf) = , ;■ =0, ...,^- 1. 

(8) 



The probability of having reached the final stage of progres- 
sion, k, at time t is 

xt(t) = 1 - e-"' ^ —^ = Pois(A:; ut) ^ 77—^. (9) 
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We also recover from the ODE system the expected waiting 
time to the final cancer stage. 
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Multistage theory provides a mathematical framework for 
describing the stepwise progression of cancer. For the above 
discussion of the model, we have neither specified the defini- 
tion of the postulated stages, nor the nature of the transitions. 
Indeed, different interpretations and uses of the model are pos- 
sible. In the following we link multistage theory closely to the 
genetic progression of cancer. 



III. GENETIC PROGRESSION OF CANCER 

In this section, we associate the stages of tumorigenesis to 
mutations in the genomes of cancer cells. Each stage is defined 
by the number of mutations that have accumulated in the cells 
of the tissue. Each mutation occurs initially in a single cell as 
the result of an erroneous DNA duplication. Some mutations 
alter the behavior of the cell in such a way that it experiences a 
growth advantage relative to the other cells in the tissue. These 
cells can outgrow their competitors in a clonal expansion and 
the mutation spreads in the tissue. While the first appearance 
of a mutation is essentially a random process, i.e., each mu- 
tation is equally likely to appear, the fate of a mutation in the 
population depends on the relative fitness of the cell in which 
it occurs. 

We define tumor progression to be in stage j of the multistep 
model, Eq. (1), if most of the tumor cells harbor exactly /' mu- 
tations. We are interested in the waiting time until k out of d 
possible mutations have accumulated, where typically k ^ d. 
For example, (Sjoblom et al., 2006) suggest that k '^20 genes 
out of d ^ 100 to 1000 need to be hit in order to develop 
invasive colon cancer In this interpretation of multistage the- 
ory, stages correspond to population states and transitions cor- 
respond to genetic transformations of the ensemble of tumor 
cells, including mutation and selection. These population dy- 
namics will be investigated in more detail in the next section. 
The focus of the present section is on how different models of 
accumulating mutations affect the waiting time. 

Genetic mutations occur randomly at erroneous cell divi- 
sions, but the subsequent fixation of the mutation within the 
cell population is constrained: A mutation will only spread if 
it confers a growth advantage. This restricts not only the mu- 
tations driving cancer, but also the order in which they can 
appear, because some physiological changes must be achieved 
before others. For example, in colon cancer, cells must lose the 
ability to undergo apoptosis, before additional mutations accu- 
mulate in the resulting neoplasia. Hence loss of function of the 



tumor suppressor gene APC controlling apoptosis is necessary 
before other mutations such as KRAS2 can fixate (Nowak et al., 
2003; Vogelstein and Kinzler, 2004). Furthermore, sometimes 
only the combined action of mutations drives cancer progres- 
sion. It is known, for example, that only the combination of 
p53 and Ras trigger the development of tumors in mice (Land 
etal, 1983). 

In general, there may exist several order constraints for the 
successive fixation of mutations as shown in Figure 2. The 
simplest of these constraints is the linear model, where the 
waiting times of all mutations are totally ordered (Figure 2(a)). 
The linear model is exactly the multistep process discussed in 
the previous section. Alternatively, mutations may occur inde- 
pendently without any constraints (Figure 2(b)). If there exist 
order relations among some of the possible mutations, a partial 
order may be used to describe the process of accumulating mu- 
tations (Figure 2(c)). We will discuss these models separately 
and show how the topology of the genotype space affects the 
waiting time. 

Let d be the number of possible mutations and denote by Tj 
the waiting time for mutation j (j — 1 , . . . , c/) to be generated 
and to establish in the tumor. The waiting times Tj are as- 
sumed to be exponentially distributed with parameters Ij, and 
they obey certain temporal order constraints (Figure 2). The 
joint distribution of T — (Ti, . . . , Td) determines how long 
it takes until k out of the d mutations have accumulated. We 
define the random variable T,t denoting stage k as the waiting 
time until any k mutations appear, 

Ti: = min ma\{Tj^,...Tj^}, [d] ^ {1,2, . . . ,d}. 

(11) 

If all mutations accumulate in a linear order (Figure 2(a)), each 
at rate 1 j , 

ri~Exp(Ai), T/ ~ r,_i +Exp(l/), j^2,...,d, 

(12) 
then Tic — Tk and the process of mutation and clonal expansion 
is mathematically equivalent to the general linear multistep 
process of Eq. (1). In this case, the transition rates u j — Ij 
may be interpreted as an effective rate for the mutation and 
the clonal expansion process. According to Eq. (2) the waiting 

time for A; < d mutations is given by E[Tk] — 2];=i 1/'^/ ^'^'1 
the waiting time scales linear with the number of mutations. 



3.1. Independent mutations 

Let us now consider the situation where all mutations may 
occur in an arbitrary order (Figure 2(b)), 

Tj ~ Expilj), j ^l,...,d. (13) 

If ^ = 1, Eq. (11) simplifies to 

Ti = min{ri, . . . , 7;,} ~ Exp(ii -\ \- Id) (14) 

and the expected waiting time is E[ti] — l/X!/=i ^j- If ^H 
fixation rates are equal to 1, then ]E[ti] — l/(dX). Thus, the 
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FIG. 2 Conjunctive Bayesian networks. Displayed are the Hasse dia- 
grams of the posets representing the graph of the underlying Bayesian 
network (left) and their corresponding genotype lattices (right) for 
d — 3 mutational events. In the Hasse diagrams, each directed edge 
i — > j denotes a direct dependency between two mutational events 
i -< j; each node j is associated with an exponential waiting time 
process with parameter X ,- conditioned on the prevalence of all mu- 
tations with directed edges to j . The genotype lattice is the lattice of 
order ideals of the poset. It consists of all genotypes that are com- 
patible with the relations of the poset. A directed edge S — > T* is 
drawn between two genotypes S, T (Z [d], if T arises from 5 by a 
mutation j e T and if the transition from S to 7 is consistent with 
the poset, i.e. there exists no i -< j with ; ^ 5. Mutations accu- 
mulate along the different paths from to [d] in the genotype lat- 
tice. Three network topologies are shown: a linear chain of mutations 
(a), independent mutations (b), and a poset in which both mutation 
1 and mutation 2 need to occur before mutation 3 (c). These posets 
induce different genotype lattices, namely a single path (a), the com- 
plete d-dimensional hypercube (b), and a lattice of intermediate size 
(c), respectively. 



occurrence of any one out of d mutations is equivalent to a 
1-step process at rate mi — dl. 

For now, we continue assuming identical rates 1. \f k > 2 
and the first mutation has occurred, then there are (i— 1 choices 
left for the second mutation to occur, hence xj ~ ti +Exp((t/— 
1)1). In general, the accumulation of k out of d mutations, 
which occur independently at the same rate X, is equivalent to 
the ^-step process, Eq. (1), with rates Uj — id — j + 1)1. From 
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If many mutations are possible, d ^ k, then the expected wait- 
ing time for k mutations is approximately E[Tk] ^ k/(dX), 
which is smaller than 1/1 and linear in k. The waiting time 
approaches zero in the limit d —^ oo for every fixed k, be- 
cause the exponential distribution is non-zero at f = 0. On 
the other hand, if k = d, all possible mutations need to occur 
and E[t^] = Hk/1, where H^ = Xy=i V; is the k-th har- 
monic number Using Hk ^ y + log k, y ^ 0.577 being the 
Euler-Mascheroni constant, we find an approximate logarith- 
mic dependency for the occurrence of all possible mutations, 
]E[ri] ^ (y + \ogk)/l. In contrast to the k <^ d case, for 
k — d, the expectation of n- is larger than 1/1 and increases 
only logarithmically in k. 

In both cases the expected waiting time is always larger if 
mutations can only occur in a linear fashion, Eq. (2), than if 
mutations are independent, Eq. (15). This is due to the fact that 
in the independent case, all mutations are possible in any step 
of the process, whereas in the linear case, only one mutation is 
feasible at each stage. 

The ODE system, Eq. (7), corresponding to the multistage 
model with unequal transition rates u j — (d — j + 1)1 has also 
an analytical solution. For j — I, . . . ,d, 
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\f k <$C d, then Uj ^ dl and Eq. (8) yields Xj{t) ^ 
Pois(j; dlt). Thus, the number of independent mutations ac- 
cumulates at a speed that is roughly d times faster than for a 
linear chain of mutations. 

We now turn to the case of independent mutations with ar- 
bitrary fixation rates Ij. The distribution of ri is given by 

Eq. (14) with expected value 1/ 'Zfj=i ^j- If ^ > 2, then for 
the second mutation there are li — 1 choices. However, the 
rate at which the second mutation occurs now depends on the 
specific realization of the first mutation. Hence, we have to 
consider the set Ck of all total orderings 



< Ti, 



(17) 



of k out of d waiting times. There are {d)k — d\/{d — k\) such 
orders. We identify Ck with the set of all mutational pathways 
y'l — > ■ ■ ■ — > jk of length k in 2^'^^. For notational convenience, 
we write such a path C e C,t as a collection of subsets C — 
(Co, Ci, . . . , Ck) such that Cq — & and C/ — Uj,^j{jf}, for 
i — 2, . . . ,k. Each set C, represents an intermediate genotype 
on the path with / mutations. 

The expected waiting time until any k out ofd mutations oc- 
cur is the weighted sum over all mutational pathways of length 
k. 
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where 
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is the probability of pathway C with [jj] — C/ \ C,_i and 
Exit(C,_i) — [d]\Ci-i being the set of all possible mutations 
at step i . Furthermore, 
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(20) 



is the expectation of the waiting time Zk given that the path 
C is realized. For a fixed pathway, say 1 ^ ■ ■ ■ —^ k, the 
waiting time distribution is Exp(/li + ■ ■ ■ + aj) for the first 
mutation,, Exp(/l2 + ■ ■ ■ + ^d) for the second mutation, and 
Exp(/ij + ■ ■ ■ + Ad) for the j-th mutation. In general, Eq. (20) 
arises from a linear A:-step process, Eq. (1), with transition rates 
u j — XfeExitfc _r) ^C- Note that this waiting time is differ- 
ent from the waiting time in the linear model, because here 
a linear pathway is considered within a much larger lattice 
of mutational patterns (Figure 2(b)). In the denominators of 
both Eqs. (19) and (20) we account for alternative evolution- 
ary routes by summing over the fixation rates of all mutations 
that could have occurred at this point. 

If all fixation rates are identical to 1, then Prob[C] — l/{d)k 
and E[tj: | C] = XLi VC-^ - i + 1)^ are independent of C, 
and we recover Eq. (15). 



3.2. Partially ordered mutations 

Sequentially and independently accumulating mutations can 
be regarded as two opposite extreme cases, where the linear 
model imposes maximum constraints on the order in which 
mutations can occur, while the independent model imposes 
none. For most biological systems, including cancer progres- 
sion, we expect more realistic models to lie somewhere in be- 
tween these extremes (Figure 2(c)). Conjunctive Bayesian net- 
works are a class of waiting time models that allow for partial 
orders among the mutations, i.e., they encode constraints like 
Ti < Tj for some of the mutations (Beerenwinkel and Sulli- 
vant, 2008; Beerenwinkel et al, 2006, 2007b). 

Formally, the (continuous time) conjunctive Bayesian net- 
work is defined recursively by a partially ordered set, or poset, 
P — {{d~\, -<) and fixation rates Xj, as 



Tj — { max r,} +Exp(ly), 

'epa(7) 



j^\,...,d, (21) 



where pa(7) = {i \ i -< j and {i^(^j^>i — (art — 
i)} is the set of mutations that cover mutation j . This model 
class includes the linear and the independent model, for which 
|pa(j)| — 1 and pa(y) — 0, respectively. It is a Bayesian 
network model, because the joint density ofT — {Ti,..., Td) 



factors into conditional densities as 
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1 = 1 

(22) 

The expected waiting time until k mutations have accumulated 
according to the partial order P can be calculated in a fashion 
similar to Eq. (18). Let J(P) C 2^"^^ denote the set of all geno- 
types that are compatible with the poset P, i.e., the subsets 
S C [d] for which /' 6 S and / -< j implies / e S. Consid- 
ering the set Ck{P) of all mutational pathways of length k in 
y(P), wefind 



CeCt(P) 



C]Prob[C], 



(23) 



with Prob[C] and E[ta. | C] defined in Eqs. (19) and (20), 
respectively. The set of possible paths Ck{P) is restricted to 
the lattice J{Py, hence the set of possible next mutations, 
Exit(C,), is also constrained to the elements compatible with 
the poset P . The expected waiting time for k mutations grows 
with the number of relations because Exit(C/) is the larger, the 
less relations exist in the poset. It is therefore maximal in a 
totally ordered set, then decreases for a partial order, and is 
minimal for an unordered set. The two opposing limit cases 
of the linear chain and the independent case represent extrema 
also in terms of the expected waiting time. 

In practice, the number of mutational pathways can be large, 
but the expectation, Eq. (23), can be computed recursively 
without the need of enumerating all paths. The conjunctive 
Bayesian network does not only allow for calculating the ex- 
pected waiting time, but it has also nice statistical properties. 
Both the parameters 1 j and the structure P of the model can 
be inferred efficiently from observed data. The maximum Uke- 
lihood estimator for the parameters is 

M 



i/ 



(24) 



Z, = l(f0-max;epa0)ff;) 

where M is the number of observations and the /-th obser- 
vation ti. is a realization of T — (Ti, . . . , Td). The maximum 
likelihood poset P is the maximal poset that is compatible with 
the data. In other words, P is simply the poset that contains all 
compatible relations. 

In practice, the occurrence times of mutations, Tj, may not 
be observable, but instead only mutational patterns are avail- 
able. This setting gives rise to a censored version of the 
conjunctive Bayesian network, in which parameter estimation 
is still feasible using an Expectation-Maximization algorithm 
(Beerenwinkel and SuUivant, 2008). 



IV. POPULATION DYNAMICS 

In the previous section we have treated genetic progression 
as an effective process with steps including both mutation and 



clonal expansion that occur at effective rates 1 j . We will now 
dissect these two processes and analyze models with explicit 
mutation and proliferation. Let fj. denote the mutation rate. 
We assume that each mutation increases fitness by the same 
amount s in a multiplicative manner such that the fitness of a 
cell with j mutations is (1 + s)-' . Before analyzing the system 
with both mutation and selection we first discuss this model for 
s — 0. This corresponds to an ensemble of A^ independently 
and identically distributed copies of the waiting time process. 
For example, such a situation is found in the colon: It consists 
of more than 10^ crypts (Humphries and Wright, 2008), each 
of which can develop an adenoma independently. The model 
also applies to the case of selectively neutral mutations in a tis- 
sue and we will present expressions for the waiting time until 
the first cell has accumulated a given number of mutations. 

4.1. Independent cell lineages 

If s — 0, all cells have the same replicative capacity ir- 
respective of their mutational patterns. Mutations therefore 
accumulate independently in a neutral evolutionary process. 
We can analyze this process by interpreting the independence 
model with rates Ij — ju as describing the state of a single 
cell. The population of genetically heterogeneous, but pheno- 
typically identical cells can then be regarded as an ensemble of 
independent cell lineages, each evolving according to Eq. (13). 
In this setting, we are interested in the average time it takes 
until the first cell with k mutations appears in a population of 
size A^, i.e., in the expectation of minjr^ ' | / — 1, . . . , A^}, 
where all r^ are identical distributed according to Eq. (1) with 

Uj ^ (d- j + 1)m- 

Rather than calculating this expectation, we take a different 
approach. Let z^ be the waiting time for k <^ d independent 
mutations, which is equivalently defined by the linear process 
with rate fid, Eq. (15). In an ensemble of many identical cell 
lineages the probabilities x j(t) — Prob[Ty < f] may be iden- 
tified with the relative abundances of cells with j mutations in 
the population. Similarly, Prob[Tjt <t] — X/xt Pois(y'; fxdt) 
is the fraction of cells having at least k mutations. When this 
fraction exceeds 1/A^, chances are high that the first cell has 
accumulated k mutations. Thus, we define 



X* =inf{f >Q\xk{t) > l/N]. 



(25) 



This quantity can also be interpreted as the (l/A^)-quantile of 
the distribution of Xk- Using Eq. (9), we can find x^ by solving 



— — Pois(A:; ^dt) y^ 



(iudt)J 



(26) 



for t. Since A^ is typically very large (A^ = 10^ to 10^ cells), 
we are searching for solutions in the regime where the right 
hand side of Eq. (26) is small. This is the case for judt <^ k. 
Then only the y = term of the sum contributes appreciably 
and we have to solve 1/A^ — Pois(fc; judx^). 

For ^ = 1, we consider the subset of 1-cells, i.e., cells con- 
taining one mutation, that starts growing in the background 



of mutation-free cells as xi(t) — /idt exp(—/idt) ^ fidt for 
f Ri 0. Thus the average waiting time to the appearance of 
the first cell with one mutation is t,* 



\/(fidN). Simi- 
larly, for A: = 2, we find X2(f) — (I /2)(fidt)^ exp{— fidt) ^ 
(l/2)(fidt)^ and thus X2(t) — l/N has the approximate so- 
lution X2 ^ y/2/{fid\/~N). Alternatively, one can arrive at 
this approximation by considering the initial linear growth of 
the population of 1-cells. The first 2-cell is produced by these 
growing 1 -cells when 



^2 1 

fid j xi{t)dt — 
/o 



A^' 



(27) 



having the same approximate solution given above. 

In general, the solution of 1/A^ — Pois(A:; fidx^) is given 
in terms of the Lambert W function, which is defined as the 
solution of W{z)e^^^^ — z. 



'k - 



/id 



Wo 






(28) 



where Wo is the principle branch of W (Corless et al, 1996). 
For large population sizes A^, the argument of the Lambert W 
function in Eq. (28) is close to zero and hence W{z) ^ Z- We 
obtain 



^ 



fidNy^' 



for all A; > 1, 



(29) 



which generalizes the approximations for Tj* and Tj given 
above. 

On the other hand, for large k, we have A:!'/*^ ^ k/e and 
f^i/k ^ I _^ (log A?)/;t leading to 



(30) 



e/id{k + \ogN) 



This approximation is less accurate, but reasonable for usual 
parameter values and A: = 0, . . . , 20 (Figure 3). For A^ = 1, it 
coincides with the result for a single cell line, Eq. (15), up to a 
constant factor of 1/e. The waiting time depends on the the in- 
verse of the logarithm of the population size. For example, the 
average waiting time to the first cell with k mutations among 
10^ cells is only about 20 times shorter than the same wait- 
ing time in a single cell. For large k, this expression becomes 
again linear in k (Figure 3). 

The normal mutation rate due to DNA polymerase errors is 
on the order of 10"^" to 10~^ base pairs (bp) per cell per gen- 
eration (Kunkel and Bebenek, 2000). For an average human 
gene size of 27kbp (Venter et al, 2001), the mutation rate per 
gene should be on the order of fi ^ 10~^ per cell per gener- 
ation. The waiting time until the first of 10^ cells has accu- 
mulated A: = 20 mutations would be on the order of 10^ cell 
generations which, in turn, typically occur at the time-scale of 
days or weeks. Thus, the waiting time would be on the or- 
der of 10^ days or more, clearly exceeding a human lifetime. 
Hence a neutral evolutionary process alone cannot account for 
the genetic progression of cancer. 
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FIG. 3 Approximate solutions of the waiting time zT as defined by 
the equation XjN = Pois(fe; fidz^), for N - 10^, ^d = 0.001, and 
k = 1 , . . . , 20. The exact solution in terms of the Lambert W function 
(filled circles, Eq. (28)) is compared to the approximation given in 
Eq. (29) (squares) and the less accurate but simpler approximation of 
Eq. (30) (triangles). 



4.2. Selection and clonal expansion 

We now analyze the dynamics of an evolving cell population 
in which each mutation confers the same selective advantage 
s > Q. Because a new mutant with an additional mutation 
has a growth advantage, it will expand in the tissue and out- 
compete the other cells. The next mutation is most likely to 
occur on this growing clone. We therefore use an evolutionary 
model of carcinogenesis that accounts for mutation and selec- 
tion (Beerenwinkel et al, 2007a) and trace the number of cells 
with j mutations, Nj{t), in each generation f = 0, 1, 2, . . . . 

Consider a population of A' cells that undergo subsequent 
rounds of cell divisions as shown in Figure 4. In each gen- 
eration, mutations occur randomly and independently at rate 
fi. The total number of possible mutations is denoted d. We 
assume that fitness, i.e., the expected number of offspring, is 
proportional to (1 +sy , where j is the number of accumulated 
mutations. The population dynamics are assumed to follow a 
Wright-Fisher process (Ewens, 2004). In this model, genera- 
tions are time-discrete and synchronized. A new configuration 
[A^o(f + 1), • • • , Nd{t + 1)] of cells is drawn from the previous 
generation t according to the multinomial distribution 

Prob [Noit + 1) = no, • • • , Nd{t + 1) = rid] 

d 



(no+ ■■■+«</)! fr^n; (31) 



no! ■■ -Wrf! 



/ = ! 





t = Q 



t = 1 



t = 2 



t = 3 



FIG. 4 Wright-Fisher process. Illustrated are four generations of a 
single realization of the Wright-Fisher process with population size 
A' = 6. In each generation / -I- 1, cells are drawn randomly from 
the previous generation / according to the multinomial distribution 
given in Eq. (31). Directed edges (— >) indicate the genealogy of this 
realization. In general, cells with more mutations are more likely 
to generate offspring and will therefore, on average, outcompete cells 
with fewer mutations. In each generation, cells are subject to mutation 
(~^). In this realization, the waiting time to the first appearance of a 
cell with k = 2 mutations was T2 — 3 generations. 



probability of sampling a y'-cell. 






i-'(l-i,f-i^pl^, (32) 



Z/(i + ^)'-^/^ 



where we defined Xjit) — N j{t)/N as the relative abundance 
of y-cells. Acell withy mutations can occur in generation f-|-l 
either as progeny of a y'-cell in generation t, or by erroneous 
duplication of a (7 — l)-cell. For .s = and infinitesimal gener- 
ation times, the model reduces to the case of independent cell 
lineages undergoing independent mutations, which has been 
discussed in the previous section. 

In general, no closed form solution of the Wright-Fisher 
process is known. However, the dynamics defined by Eqs. (31) 
and (32) display certain regularities that can be exploited in 
order to derive an approximate analytical expression for the 
expected waiting time to the first cell with k mutations, r^. 
Numerical simulations show that the subsets of y'-cells se- 
quentially sweep through the population and the mutant waves 
travel at constant speed (Beerenwinkel et al., 2007a). This reg- 
ular behavior can be analyzed by decomposing the process into 
the generation of a new cell type by mutation and its clonal ex- 
pansion driven by selection. 

The dynamics of clonal expansions are given by the replica- 
tor equation (Nowak, 2006), 



Xj{t) ^SXj{t) 



- ^ / Xi (f ) 



(=1 



(33) 



where n 







Ilk — N. The parameters Bj denote the 



where we consider only those cell types that are already 
present in the system and we ignore mutation. The fitness of 
7-cells is (1 + s)J Ri 1 + js, if s <^ 1. Eq. (33) has a solution 
in terms of the Gaussians Xjit) — Aexp(— [y — vt]^/[2a^]) 
with normalization constant A and width a^ — v/s, where 



V is the velocity of the traveling wave. The initial growth of 
a newly founded clone is exponential, but eventually follows 
this Gaussian distribution. The final decline corresponds to the 
clone ultimately becoming extinct by outcompetition of fitter 
clones harboring additional mutations. 

The velocity of the waves is determined by the mutation pro- 
cess. A new (j + l)-cell is generated by mutation from the 
growing clone of y-cells. The equation Xj^i(t) — \/N can 
therefore be rewritten, similar to Eq. (27), as 



;+i 1 

iudxj{t)dt = — -, 
^ 



(34) 



where initially, Xj grows exponentially according to Eq. (33). 
This approach finally yields the approximate expected waiting 
time (Beerenwinkel et al., 2007a) 



k\og\s/{fid\) 
2s log A^ 



(35) 



This expression suggests approximating the Wright Fisher pro- 
cess, Eqs. (31) and (32), by a linear multistep process, Eq. (1), 
with transition rate u — {2s\ogN)/\o^{s/[fid\) in which 
stages correspond to clonal expansions (Maley, 2007). Com- 
paring Eq. (35) with the waiting time in a neutral evolutionary 
process, Eq. (30), here the waiting time per mutation, \/du, 
contributes only logarithmically and the expected waiting time 
Eq. (35) is proportional to k/s, reducing the overall waiting 
time considerably. The reason for this acceleration lies in the 
growth advantage of the mutated cells: A single cell produces 
an exponentially growing number of clonal offspring. This 
growth, in turn, directly relates to the probability of creating 
a cell with an additional mutation. Therefore, clonal expan- 
sions dramatically speed up the accumulation of mutations in 
a population. 

For example, considering a fitness advantage of s — 10~^ 
per mutation, li = 100 susceptible loci, a mutation rate of m = 
10~^ per gene, and a population size of N — 10^ cells results 
in a waiting time of Tjq ^10^ generations. With a generation 
time of 1 to 2 days, this waiting time ranges on the time scale of 
several years, being consistent with clinical observations. By 
contrast, the waiting time in the neutral model is on the order 
of 10^ generations. Hence even a moderate selective advantage 
decreases the waiting time by three orders of magnitude. 

The time r t denotes the time after which the probability that 
a cell with j mutations has been generated exceeds 1/A^. This 
is an approximation for the expected waiting time of the first 
7-cell with an additional mutation in a population of size A^. 
For the Wright-Fisher process it is known, however, that due 
to genetic drift the probability of fixation of a selectively ad- 
vantageous mutation initially present in a single cell is only 
2s (Ewens, 2004). Hence, the majority of mutated cells be- 
come extinct. This is also observed in the numerical simu- 
lations of the Wright-Fisher process (Eqs. (31, 32); Beeren- 
winkel et al. (2007a)). On average it takes l/2s cells un- 
til the first successful mutant is generated. This effect is in- 
cluded indirectly in approximation Eq. (35): xj(t) oc e*' is 
the expected frequency conditioned only on Xj(0) — l/N and 



not on survival. It also accounts for all trajectories, includ- 
ing those going extinct. Recently, this effect has been stud- 
ied in a related model (Brunet et al., 2008; Desai and Fisher, 
2007). Desai and Fisher (2007) found an approximate waiting 
time of T^* «i klog{s/[ud])/(s[2logN + log{sud}]). Compar- 
ing with expression Eq. (35), the only difference is the term 
log(sud) in the denominator. For typical parameter values, 
log(sud) « — 7 Ri — logA^. Therefore, the waiting time is 
larger by a factor of 1.5 as compared to Eq. (35). But this 
comparison is limited, because the models are not identical. 
For example, Desai and Fisher (2007) obtain a fixation proba- 
bihty of s, whereas in the Wright-Fisher model it is 2s. 



V. CONCLUSION 

A quantitative understanding of cancer progression is re- 
quired for constructing clinical markers and for revealing rate- 
limiting steps of this process. Here, we have analyzed waiting 
time models for carcinogenesis and solved the equations defin- 
ing the expected waiting times. Similar quantities have previ- 
ously been shown to measure the degree of tumor progression 
and to predict survival in cancer patients (Rahnenfiihrer et al., 
2005). 

In the simplest case, carcinogenesis may be described by a 
linear multistep process. The progression stages are generally 
described by histological alterations and functional changes, 
or on a molecular level, by mutation of certain genes and sub- 
sequent clonal expansion. In a general multistep process, the 
overall waiting time to reach stage k is the sum of the waiting 
times of all predecessing steps. 

If tumor stages are defined by the number of mutations that 
have fixated in the cell population, then the progression dy- 
namics depends on the order in which mutations accumulate. 
For example, mutations may accumulate in a linear fashion, 
according to a partial order, or completely independently. Lin- 
ear accumulation is the slowest and independent progression 
is the fastest. The acceleration can be considerable, especially 
if many mutations are available that drive carcinogenesis. 

The linear and the independent model present opposing lim- 
its of the conjunctive Bayesian network family of models, in 
which the mutations obey a partial order The relations of the 
poset may result from causal relationships among mutations, 
such as the requirement in colon cancer for the tumor suppres- 
sor APC to be mutated before other mutations are beneficial 
and can fixate. The poset constraints induce a subset of muta- 
tional pathways in the hypercube representing all combinato- 
rial genotypes (Figure 2). Because the waiting time is additive 
in the mutational pathways, its expected value for the conjunc- 
tive Bayesian networks ranges between those for the linear and 
the independent model. 

We have discussed a particular instance of the Wright-Fisher 
process, an evolutionary model comprising mutation and se- 
lection. In this model, we find waiting times on the order of 
20 years for a normal mutation rate and a selective advantage 
of 1 % per mutation. The successive clonal waves might be re- 
garded as the stages in classical multistage theory. A neutral 
evolutionary process can not explain the clinical progression of 



colon cancer, in which about 20 out of hundreds of mutations 
accumulate in a time frame of 5 to 20 years. This process may 
only be explained by advantageous mutations giving rise to 
clonal expansions. These selective sweeps drastically increase 
the chances of acquiring additional mutations the spreading 
offspring. 
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